Abstract



In this paper, we present FASE (Faster Asynchronous Systems Evaluation), a tool for 
evaluating the worst-case efficiency of asynchronous systems. The tool is based on some 
well-established results in the setting of a timed process algebra (PAFAS: a Process Algebra
for Faster Asynchronous Systems). To show the applicability of FASE to concrete
meaningful examples, we consider three implementations of a bounded buffer and use FASE 
to automatically evaluate their worst-case efficiency. We finally contrast our results with 
previous ones where the efficiency of the same implementations has already been considered. 



1 Introduction 



PAFAS [6] has been proposed as a useful tool for comparing the worst-case efficiency of
asynchronous systems. It is a CCS-like process description language [10] where basic actions are
atomic and instantaneous but have an associated time bound, interpreted as the maximal time
delay for their execution. These upper time bounds can be used to evaluate efficiency, but they
do not influence functionality (which actions are performed); so, like CCS, PAFAS
treats the full functionality of asynchronous systems. In [6], processes are compared via a
variant of the testing approach developed by De Nicola and Hennessy in [7]. Tests considered in [6]
are test environments (as in [7]) together with a time bound. A process is embedded into the
environment (via parallel composition) and satisfies a (timed) test if success is reached before
the time bound in every run of the composed system, i.e. even in the worst case. This gives
rise to a faster-than preorder over processes that is naturally an efficiency preorder. Moreover,
this efficiency preorder can be characterised as inclusion of a special kind of refusal traces, which
provides decidability of the testing preorder for finite state processes.



This work was supported by the PRIN Project 'Paco: Performability-Aware Computing: Logics, Models, and

In [1], it has been shown that the faster-than preorder provided in [6] can equivalently
be defined on the basis of a performance function that gives the worst-case time needed to 
satisfy any test environment (or user behaviour). If the above timed testing scenario is adapted 
by considering only test environments that want n tasks to be performed as fast as possible 
(possibly in parallel), this performance function is asymptotically linear. This provides us with 
a quantitative measure of system performance, essentially a function from natural numbers to 
natural numbers called response performance function that measures how fast the system under 
consideration responds to requests from the environment. 

In this paper, we present FASE, a corresponding tool that supports the evaluation of this 
function for a given system. In order to show the applicability of FASE to concrete meaningful 
examples, we consider three different implementations of a bounded buffer and use FASE to 
automatically evaluate their efficiency. The three implementations are called Fifo, Pipe and Buff. 
Fifo is a bounded-length first-in-first-out queue, Pipe is a sequence of one place buffers connected 
end-to-end and Buff is an array used in a circular fashion. We prove that Fifo is always more 
efficient than Pipe and Buff, and that Buff is more efficient than Pipe only if the number of 
requests is sufficiently small w.r.t. the size of the buffer. These results are quite different from
those presented in [3] (see Section [5]) where the efficiency of the same buffer implementations
has been compared by means of the efficiency preorder defined in [5]. The reason is that here
(as in [3]) we only consider a specific class of user behaviours.

The rest of this paper is organised as follows. Section [2] recalls PAFAS and the technical
details we need to define the response performance. Section [3] presents FASE and its main
algorithms. Section [4] describes the three buffer implementations and states our main results.
Finally, Section [5] presents some concluding remarks.

2 PAFAS 

In this section we briefly introduce PAFAS, its operational semantics and the performance
function used to evaluate worst-case efficiency. We refer the reader to [6] and [1] for more details. We
use the following notation: $\mathbb{A}$ is an infinite set of basic actions with a special action $\omega$, which is
reserved for observers (test processes) in the testing scenario to signal the success of a test. The
additional action $\tau$ represents an internal activity that is unobservable from other components.
Actions in $\mathbb{A}_\tau = \mathbb{A} \cup \{\tau\}$ (ranged over by $\alpha, \beta, \ldots$) can let time 1 pass before their execution, i.e.
1 is their maximal delay. After that time, they become urgent actions. The set of urgent actions
is $\underline{\mathbb{A}}_\tau = \{\underline{a} \mid a \in \mathbb{A}\} \cup \{\underline{\tau}\}$ and it is ranged over by $\underline{\alpha}, \underline{\beta}, \ldots$. Furthermore, $\mathcal{X}$ is the set of process
variables $x, y, z, \ldots$ used for recursive definitions. A general relabelling function (incorporating
relabelling and hiding) is a function $\Phi : \mathbb{A}_\tau \to \mathbb{A}_\tau$ where the set $\{\alpha \in \mathbb{A}_\tau \mid \emptyset \neq \Phi^{-1}(\alpha) \neq \{\alpha\}\}$ is
finite and $\Phi(\tau) = \tau$.

Definition 2.1 (Timed Processes) The set $\mathbb{P}$ of (timed) processes is the set of closed (i.e. without
free variables) and guarded (i.e. variable $x$ in a $\mu x.P$ only appears within the scope of a prefix
$\alpha.Q$, where $\alpha \in \mathbb{A}_\tau$) terms generated by the following grammar:

$P ::= 0 \mid \gamma.P \mid P + P \mid P \|_A P \mid P[\Phi] \mid x \mid \mu x.P$

where $\gamma$ is $\alpha$ or $\underline{\alpha}$ for some $\alpha \in \mathbb{A}_\tau$, $\Phi$ is a general relabelling function, $x \in \mathcal{X}$ and $A \subseteq \mathbb{A}$ is possibly
infinite.

A brief description of our operators now follows. $0$ is the Nil-process, which cannot perform
any action, but may let time pass without limit; $a.P$ and $\underline{a}.P$ are (action-)prefixing, known
from CCS. In particular, process $a.P$ performs $a$ with a maximal delay of 1; hence, it can either
perform $a$ immediately, or can idle for time 1 and become $\underline{a}.P$. In the latter case, the idle-time
has elapsed and action $a$ must either occur or be deactivated (in a choice-context) before time
may pass further. Our processes are patient: as a stand-alone process, $\underline{a}.P$ has no reason to wait;
but as a component in $\underline{a}.P \|_{\{a\}} a.Q$, it has to wait for synchronisation on $a$ and this can take up
to time 1, since the component $a.Q$ may idle this long. $P_1 + P_2$ models the choice between two
conflicting processes $P_1$ and $P_2$. $P_1 \|_A P_2$ is the TCSP-like parallel composition of two processes
$P_1$ and $P_2$ that run in parallel and have to synchronise on all actions from $A$ [2]. In the following
we write $\|$ as a shorthand for $\|_{\mathbb{A} \setminus \{\omega\}}$. $P[\Phi]$ behaves as $P$ but with the actions changed according
to $\Phi$. Finally, $\mu x.P$ models a recursive definition; recursive equations are a common way of
defining processes.

We now define the refusal traces of a process $P$. Intuitively, a refusal trace records, along a
computation, which actions $P$ can perform ($P \xrightarrow{\alpha} P'$, $\alpha \in \mathbb{A}_\tau$) and which actions $P$ can refuse to
perform ($P \xrightarrow{X}_r P'$, $X \subseteq \mathbb{A}$). A transition like $P \xrightarrow{X}_r P'$ is called a (conditional) time step. The
actions in the set $X$ are not urgent (see rule Pref$_{r2}$ in Fig. [1]) so $P$ is justified in not performing
them but performing a time step instead. Since other actions might be urgent and cannot be
refused, $P$ as a stand-alone process might actually be unable to let time pass. But if $P$ is a
component of a larger system, these actions might be further delayed due to synchronisation
with some other components, and a time step is possible. Whenever $P$ can make a time step in
any context (i.e. if $P \xrightarrow{X}_r P'$ and $X = \mathbb{A}$), we say that $P$ performs a full time step and also write
$P \xrightarrow{1} P'$.

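Definition 2.2 (Refusal operational semantics) The SOS-rules in Fig. [1] (plus symmetric rules
for Par$_{a1}$ and Sum$_a$ for actions of $P_2$) define the transition relations $\xrightarrow{\alpha} \subseteq (\mathbb{P} \times \mathbb{P})$ for $\alpha \in \mathbb{A}_\tau$
and $\xrightarrow{X}_r \subseteq (\mathbb{P} \times \mathbb{P})$ for $X \subseteq \mathbb{A}$.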

The rules in Fig. [1] explain the operational semantics of PAFAS processes. A process like
$a.P$ can either perform action $a$ immediately and then become $P$ (rule Pref$_{a1}$), or can let time
1 pass and refuse any set of actions (rule Pref$_{r1}$). A process $\underline{a}.P$ can perform an action $a$ (rule
Pref$_{a2}$) and on its own cannot delay such an execution (rule Pref$_{r2}$). Since the internal action
$\tau$ never has to be synchronised, a process prefixed by an urgent $\tau$ cannot make a time step.
Another rule worth noting is Par$_r$, which defines which actions a parallel composition can refuse
during a time step. The intuition is that $P_1 \|_A P_2$ can refuse an action $\alpha$ if either $\alpha \notin A$ ($P_1$, $P_2$
are not forced to synchronise on $\alpha$) and both $P_1$, $P_2$ can refuse $\alpha$, or $\alpha \in A$ ($P_1$, $P_2$ are forced to
synchronise on $\alpha$) and either $P_1$ or $P_2$ can refuse $\alpha$. The other rules are as expected.

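For sequences $w \in (\mathbb{A}_\tau \cup 2^{\mathbb{A}})^*$, we define $P \xrightarrow{w}_r P'$ as expected: $P \xrightarrow{w}_r P'$ if either $w = \varepsilon$ (the
empty sequence) and $P' = P$ or there is $Q \in \mathbb{P}$ and $\mu \in (\mathbb{A}_\tau \cup 2^{\mathbb{A}})$ such that $P \xrightarrow{\mu}_r Q \xrightarrow{w'}_r P'$
and $w = \mu w'$. Similarly, we define $P \xrightarrow{w} P'$ for $w \in (\mathbb{A}_\tau \cup \{1\})^*$. In the latter case, $\zeta(w)$ is
the duration of $w$, i.e. the number of full time steps in $w$. We write $P \stackrel{v}{\Rightarrow}_r P'$ ($P \stackrel{v}{\Rightarrow} P'$) if
$P \xrightarrow{w}_r P'$ ($P \xrightarrow{w} P'$, resp.) and $v = w/\tau$ ($v$ is the sequence $w$ with all $\tau$'s removed). Finally,
$RT(P) = \{w \mid P \stackrel{w}{\Rightarrow}_r\}$ and $DL(P) = \{w \mid P \stackrel{w}{\Rightarrow}\}$ are the sets of refusal traces and discrete traces
(resp.) of $P$.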

1 A trailing 0 will often be omitted, so e.g. a.b + c abbreviates a.b.0 + c.0.

Figure 1: The Refusal Operational Semantics of PAFAS processes.

For processes $P, Q \in \mathbb{P}$, $RT(P) \subseteq RT(Q)$ implies $DL(P) \subseteq DL(Q)$: $DL(P)$ corresponds to the
set of traces $w \in RT(P)$ where $X = \mathbb{A}$ for all refusal sets $X$ in $w$. Finally, the refusal transition
system $RTS(P)$ of $P$ is defined as the set of all transitions $Q \xrightarrow{\mu}_r Q'$ with $\mu \in \mathbb{A}_\tau$ or $\mu \subseteq \mathbb{A}$
where $Q$ is reachable from $P$ via such transitions. It is easy to prove that $RTS(P \|_A Q)$ can be
determined from $RTS(P)$ and $RTS(Q)$ according to the SOS-rules for parallel composition given
in Fig. [1].

In the timed testing of [6], $P$ satisfies a timed test (observer $O$ with special success action $\omega$
plus time bound $D$) if every discrete trace of $P \| O$ performs $\omega$ before time $D$; $P$ is faster than
$Q$, written $P \sqsupseteq Q$, if $P$ satisfies all timed tests that $Q$ satisfies. This preorder is a qualitative notion
since a timed test is either satisfied or not, and a process is more efficient than another or not.

One of the main results in [6] is that the faster-than preorder can be characterised by
refusal-trace inclusion, i.e. $P \sqsupseteq P'$ iff $RT(P) \subseteq RT(P')$ (see Theorem 5.13 in [6]). A new formulation of
this preorder has been provided in [1] (see Prop. 9) that brings to light its quantitative nature;
the new formulation is given using the following performance function:

Definition 2.3 (Performance) Let $P \in \mathbb{P}$ be a process and $O \in \mathbb{P}$ be a test process. We define
the performance function $p$ as:

$p(P, O) = \sup\{ n \in \mathbb{N}_0 \mid \exists v \in DL(P \| O) : \zeta(v) = n \text{ and } v \text{ does not contain } \omega \}$

If the right-hand side has no maximum, the supremum is $\infty$. The performance function $p_P$ is
defined by $p_P(O) = p(P, O)$, and we write $P \sqsupseteq Q$ if $p_P(O) \le p_Q(O)$ for each $O$.

The performance function $p$ (as well as the preorder $\sqsupseteq$) contrasts processes w.r.t. all
possible test environments. In some cases, this might be too demanding and one can make some
reasonable assumption about the user behaviour. Consider a scenario where users have a
number of requests (made via in-actions) that they want to be answered (via out-actions) as fast
as possible. This class of users is defined as $U = \{U_n \mid n \ge 1\}$ where $U_1 = in.out.\omega$ and
$U_n = U_{n-1} \|_{\{\omega\}} in.out.\omega$ (for any $n > 1$). Given these users, we can define the response
performance $rp$ of a testable process $P$ as a function from $\mathbb{N}$ to $\mathbb{N}_0$ with $rp_P(n) = p_P(U_n) = p(P, U_n)$;
here $n$ is the size (i.e. the number of requests) of the user.

In what follows we briefly describe how the response performance of a process P can be cal- 
culated from its refusal transition system. We restrict attention to so-called response processes, 
which never produce an out without a corresponding preceding in. 

By Definition [2.3], to determine $rp_P(n)$ we have to consider all $w \in DL(P \| U_n)$ that do not
contain $\omega$, count the number of their full time steps and then take the supremum of the numbers
so obtained. These traces are just paths in $RTS(P \| U_n)$ that do not contain $\omega$ and contain
only full time steps. These paths can have at most $n$ in's and $n$ out's (due to synchronisation
with $U_n$). But after the $n$-th out, an urgent $\omega$ becomes available and no more full time steps can
occur before $\omega$; in other words, full time steps are only possible before the $n$-th out. So we only
have to consider paths in $RTS(P \| U_n)$ that contain only full time steps and have at most $n$
in's and $n - 1$ out's (and, hence, no $\omega$). In [4] it has been proven that for each of these paths
there is a so-called n-critical path in rRTS(P) (see footnote 2) with the same number of time steps. Thus, the
following characterisation for the response performance can be given.

Theorem 2.4 (Characterisation for response performance) A path in rRTS(P) is n-critical if it
contains at most $n$ in's, at most $n - 1$ out's, and all time steps before the $n$-th in are full. The
response performance of a process $P$ is the supremum of the numbers of time steps taken over
all n-critical paths.

Now a key observation is that, when the number $n$ of requests is large compared to the
number of processes in rRTS(P), an n-critical path with many time steps must contain cycles.
Finding the worst cycles turns out to be essential for performance evaluation. In [3], these worst
cycles are classified as either catastrophic or bad cycles.

Definition 2.5 (Catastrophic cycle) A cycle in rRTS(P) is a catastrophic cycle if it has a positive
number of time steps but no in's and no out's. If rRTS(P) has a catastrophic cycle then
$rp_P(n) = \infty$ for some $n$.

Intuitively, once in a catastrophic cycle, we cannot satisfy any other request (this is because
a catastrophic cycle does not contain out-actions) but time can pass indefinitely (the cycle has
at least one time step). As a consequence, there exists some $n$ (depending on how many in- and
out-actions are performed on a path in rRTS(P) from $P$ to this cycle) such that $rp_P(n) = \infty$,
i.e. some user is not satisfied within a bounded time. If rRTS(P) is free from catastrophic cycles
we search for the so-called bad cycles:



2 This is a reduced version of the RTS(P) where all conditional time steps, that cannot participate in a full 
time step when P runs in parallel with a user U n , are removed. For more details see [4]. 

Definition 2.6 (Bad cycle) For $P$ without catastrophic cycles, we consider cycles reached from
$P$ by a path where all time steps are full and which themselves contain only full time steps. Let
the average performance of such a cycle be the ratio between the number of its full time steps
and the number of its in-actions. A bad cycle is a cycle in rRTS(P) which has maximal average
performance.

Theorem 16 in [4] shows that $rp_P$ is asymptotically linear, i.e. $\exists a \in \mathbb{R}$ s.t. $rp_P(n) =
an + O(1)$, and that the "asymptotic performance" $a$ of $P$ is the average performance of a bad
cycle. In other words, while n-critical paths give the exact value of the response performance of
a process, the average performance of a bad cycle describes its asymptotic behaviour. Both catastrophic
and bad cycles can be automatically checked with FASE.

3 Performance evaluation with FASE 

In this section we introduce FASE (see footnote 3), the tool that has been used to automatically evaluate the
worst-case efficiency of the three buffer implementations discussed in Section [4]. FASE is written
in Java and consists of two main components. The first one is essentially a parser
unit; it takes as input a sequence of characters that represents a PAFAS process $P$ and builds
its RTS(P). The second one is the performance module, which implements the algorithms used to
evaluate the worst-case efficiency of $P$. The two modules are loosely coupled; they communicate
via a shared Java data structure or via an XML-based representation of the RTS. This loose coupling
is very important since changes to one module do not affect the other one; moreover, the XML
interface guarantees a broader interoperability with external tools such as graph visualisers,
which could be useful for further analysis of the modelled systems.

3.0.1 Parsing unit 

Fig. [2] shows on top the parsing phase, which is based on two well-known tools: JFlex [9] as the
lexer generator and jacc [11] as the parser generator. JFlex defines how input streams must be
arranged into words, called tokens, while the jacc pseudocode gives rules, called productions, to
combine such tokens. These productions are used by the parser to generate the data structure
that contains the hierarchical representation of the process, where each element is a term of
the grammar in Definition [2.1]. For example, after parsing $P = a.nil + \underline{b}.nil$, the hierarchy
obtained has on top the process variable $P$, which contains a choice operator with a
prefix $a.nil$ and an urgent prefix $\underline{b}.nil$ respectively, and so on. Every element is an instance of a
Java class that handles the respective SOS rules given in Fig. [1]; thus, an element encapsulates
both the functional and the temporal behaviour used to generate RTS(P), as indicated at the bottom of
Fig. [2].

The building process of RTS(P) exploits the hierarchical structure, traversing it from the root
element; at each step the operator objects generate the proper nodes and transitions according
to Definition [2.2]. For instance, $P = a.nil + \underline{b}.nil$ will produce the node $P$ with two outgoing
transitions $a$ and $b$ to the same node nil; the additional refusal transition $\{a\}$ to the process
$\underline{a}.nil + \underline{b}.nil$ will be produced according to rules Sum$_r$, Pref$_{r1}$ and Pref$_{r2}$. The same method
will be applied to the remaining nodes as expected.
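
The way operator objects generate transitions can be illustrated with a minimal sketch. It is written in Python for brevity and is not FASE's Java code. Terms are encoded as nested tuples, the helper names (`prefix`, `choice`, `action_steps`, `time_step`) and the finite alphabet are hypothetical, and only the maximal refusal set of a time step is computed. The rule shapes for Nil and Sum are assumptions in line with the prose above, since Fig. [1] is not reproduced here.

```python
# Hypothetical tuple encoding of a small PAFAS fragment: nil, prefix, choice.
NIL = ("nil",)


def prefix(action, cont, urgent=False):
    """Build a.P (urgent=False) or its urgent version (urgent=True)."""
    return ("pre", action, urgent, cont)


def choice(left, right):
    """Build P1 + P2."""
    return ("sum", left, right)


def action_steps(term):
    """Return the set of (action, target) pairs the term can perform."""
    kind = term[0]
    if kind == "pre":                      # rules Pref_a1 and Pref_a2
        _, action, _, cont = term
        return {(action, cont)}
    if kind == "sum":                      # rule Sum_a and its symmetric version
        return action_steps(term[1]) | action_steps(term[2])
    return set()                           # nil performs no action


def time_step(term, alphabet):
    """Return (maximal refusal set, target) or None if no time step exists."""
    kind = term[0]
    if kind == "nil":                      # assumed: nil refuses everything
        return frozenset(alphabet), term
    if kind == "pre":
        _, action, urgent, cont = term
        if not urgent:                     # Pref_r1: refuse anything, become urgent
            return frozenset(alphabet), prefix(action, cont, urgent=True)
        if action == "tau":                # an urgent tau blocks time
            return None
        return frozenset(alphabet) - {action}, term   # Pref_r2
    left = time_step(term[1], alphabet)    # assumed Sum_r: both summands step
    right = time_step(term[2], alphabet)
    if left is None or right is None:
        return None
    return left[0] & right[0], choice(left[1], right[1])


# The example from the text: a lazy a-prefix plus an urgent b-prefix.
P = choice(prefix("a", NIL), prefix("b", NIL, urgent=True))
ACTIONS = action_steps(P)                  # a and b, both leading to nil
REFUSED, TARGET = time_step(P, {"a", "b"}) # refusal set {a}, both prefixes urgent
```

Under these assumptions the sketch reproduces the example: two action transitions to nil and one time step refusing {a}, after which both prefixes are urgent.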

3 http://cosy.cs.unicam.it/fase/

Figure 2: An architectural overview of FASE.



Such an architectural structure provides several advantages. The pseudocode of both the lexer
and the parser is based on common syntaxes (such as regular expressions and BNF rules) that are
much smaller than the corresponding Java code, easier to understand and easier to maintain. The semantics
of each operator is coded in a separate compilation unit, hence it can be specified independently and
modified at a later stage, if necessary.



3.0.2 Performance unit 

The performance component provides all the algorithms needed to evaluate system performance
according to the theoretical results stated in the previous section. In particular, FASE adopts
two new algorithms for catastrophic cycle detection and bad cycle calculation that improve
those proposed in [1]. Moreover, FASE is also able to generate the complete set of traces that
characterises the behaviour of the process. Such diagnostic information is useful to the user since
it helps to understand why a modelled system produces catastrophic cycles or has a certain
worst-case performance. This feature has helped us to validate the results on the response performance
of the three buffer implementations discussed in the next section.



Catastrophic cycles The problem of finding catastrophic cycles in a process $P$ has been
solved in [3] in time $O(N^3)$ where $N$ is the number of nodes in rRTS(P). The new algorithm
adopted in FASE reduces the task to the well-known problem of finding the Strongly Connected
Components (SCCs) of a graph [1]. Since an SCC of a graph is a subgraph that is strongly connected and
maximal, the following suffices. We obtain a new graph $G$ from rRTS(P) by deleting all edges
labelled in and out and apply the algorithm for finding the SCCs. If at least one SCC contains some
time step, we can conclude that $P$ has a catastrophic cycle. Indeed, if $S$ is an SCC in $G$ and
there is a time step $(u, v)$ with $u$ and $v$ nodes of $S$, then $S$ has a path from $v$ to $u$, i.e. it has a
catastrophic cycle that is also contained in rRTS(P). Vice versa, if $P$ has a catastrophic cycle,
it is contained in some SCC of $G$, which therefore contains a time step.
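
The SCC-based check can be sketched as follows. This is an illustrative Python version, not FASE's Java implementation. The edge encoding `(source, label, target)` and the label `"time"` for time steps are assumptions made for the example, and Kosaraju's algorithm is used as one standard linear-time way of computing SCCs.

```python
from collections import defaultdict


def strongly_connected_components(nodes, succ):
    """Kosaraju's algorithm (iterative); returns a dict node -> component id."""
    order, seen = [], set()
    for root in nodes:                       # first pass: record finishing order
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(succ[root]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(succ[nxt])))
                    break
            else:                            # all successors explored
                order.append(node)
                stack.pop()
    pred = defaultdict(list)                 # reversed graph
    for u in nodes:
        for v in succ[u]:
            pred[v].append(u)
    comp = {}
    for root in reversed(order):             # second pass on the reversed graph
        if root in comp:
            continue
        comp[root] = root
        work = [root]
        while work:
            node = work.pop()
            for prev in pred[node]:
                if prev not in comp:
                    comp[prev] = root
                    work.append(prev)
    return comp


def has_catastrophic_cycle(edges):
    """edges: iterable of (source, label, target); 'time' marks a time step."""
    edges = list(edges)
    nodes = {n for u, _, v in edges for n in (u, v)}
    # Graph G: delete all edges labelled in and out.
    kept = [(u, lab, v) for u, lab, v in edges if lab not in ("in", "out")]
    succ = defaultdict(list)
    for u, _, v in kept:
        succ[u].append(v)
    comp = strongly_connected_components(nodes, succ)
    # A time step whose endpoints share an SCC lies on a cycle without in/out.
    return any(lab == "time" and comp[u] == comp[v] for u, lab, v in kept)
```

For example, a graph containing the edges q --time--> q2 and q2 --tau--> q is reported as catastrophic, whereas a cycle that can only be closed through an in- or out-edge is not.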

Cells     nodes/edges    previous algorithm    new algorithm    Gain
Pipe 5    114/292                        37               16    74.1 %
Buff 5    96/216                         15               15    -
Pipe 6    272/759                       578               63    89.1 %
Buff 6    160/368                        93               22    76.3 %
Pipe 7    648/1958                    11484              296    97.4 %
Buff 7    240/560                       390               47    87.9 %
Pipe 8    1544/5034                  620109             1575    99.7 %
Buff 8    336/792                      1172               70    94.0 %
Pipe 9    3680/12902                      -             9687    100 %
Buff 9    448/1064                     2922              109    96.2 %

Table 1: Catastrophic-cycle detection time (expressed in ms)



The standard SCC discovery algorithm has complexity $O(N + E)$ with $N$ and $E$ the number
of nodes and edges of $G$ respectively, and the same applies to the construction of $G$ and thus to
finding catastrophic cycles in FASE. Table [1] reports the running time for the original and the
new algorithm.

Bad cycles Next we look for a bad cycle, possibly not unique, of rRTS(P) according to
Definition [2.6], which gives the average performance of $P$. To determine this value, a graph $G$
is obtained from rRTS(P) by deleting all non-full time steps and all nodes that are no longer
reachable (see the proof of Theorem 17 of [3] for more details). To apply a known algorithm from the
literature, we do not look for a cycle with maximal average performance in $G$, but for one with
minimal average throughput, the latter being just the inverse of the average performance. Such
a cycle can be seen as a set of sub-paths where each one ends in a time step.

For the known algorithm, we must transform $G$ into a graph $G'$ where each edge is weighted
with some cost and represents one time step, i.e. an edge corresponds to such a sub-path. Since
the costs should be minimal, the sub-path without the last node must be a shortest path between
the respective nodes as measured by the number of in's. Hence, one obtains a new graph $G_0$ by
deleting all time steps in $G$ and computes its all-pairs shortest paths matrix $d$ with the
Floyd-Warshall algorithm, considering a weight 1 for in-transitions and 0 for all the other edges. The
final graph $G'$ is constructed from the nodes of $G$ on the basis of the matrix $d$: for every two
nodes $u$, $v$ of $G$, where $d(u, v)$ is finite and there exists a time step from $v$ to $v'$, we add the edge
$(u, v')$ with cost $d(u, v)$. This construction can be carried out in time $O(N^3)$. Now the problem
of finding the minimal average throughput $t$ can be solved with Karp's algorithm [8] applied to
graph $G'$.
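
A minimal Python sketch of this pipeline is given below; it is illustrative and not FASE's Java implementation. The edge encoding `(source, label, target)` with the label `"time"` for full time steps, and the function names, are assumptions made for the example. Parallel edges of $G'$ are merged by keeping the cheapest one, which does not change the minimal mean.

```python
from fractions import Fraction
from math import inf


def build_g_prime(nodes, edges):
    """Build G' as {(u, v2): cost} from G via Floyd-Warshall on G_0."""
    nodes = list(nodes)
    d = {u: {v: (0 if u == v else inf) for v in nodes} for u in nodes}
    time_steps = []
    for u, label, v in edges:
        if label == "time":
            time_steps.append((u, v))      # time steps are not part of G_0
        else:
            d[u][v] = min(d[u][v], 1 if label == "in" else 0)
    for k in nodes:                        # Floyd-Warshall on G_0
        for i in nodes:
            dik = d[i][k]
            if dik == inf:
                continue
            for j in nodes:
                if dik + d[k][j] < d[i][j]:
                    d[i][j] = dik + d[k][j]
    g_prime = {}
    for u in nodes:
        for v, v2 in time_steps:
            if d[u][v] < inf:              # sub-path u ~> v, then the time step
                g_prime[(u, v2)] = min(g_prime.get((u, v2), inf), d[u][v])
    return g_prime


def min_mean_cycle(nodes, weighted_edges):
    """Karp's algorithm; returns the minimal cycle mean or None if acyclic."""
    nodes = list(nodes)
    n = len(nodes)
    # D[k][v]: minimal weight of a walk with exactly k edges ending in v,
    # starting anywhere (equivalent to a zero-cost super source).
    D = [{v: inf for v in nodes} for _ in range(n + 1)]
    for v in nodes:
        D[0][v] = 0
    for k in range(1, n + 1):
        for (u, v), w in weighted_edges.items():
            if D[k - 1][u] + w < D[k][v]:
                D[k][v] = D[k - 1][u] + w
    best = None
    for v in nodes:
        if D[n][v] == inf:
            continue
        worst = max(Fraction(D[n][v] - D[k][v], n - k)
                    for k in range(n) if D[k][v] < inf)
        if best is None or worst < best:
            best = worst
    return best


# A one-place buffer whose worst-case cycle is: time step, in, time step, out.
NODES = ["s0", "s1", "s2", "s3"]
EDGES = [("s0", "time", "s1"), ("s1", "in", "s2"),
         ("s2", "time", "s3"), ("s3", "out", "s0")]
THROUGHPUT = min_mean_cycle(NODES, build_g_prime(NODES, EDGES))
```

In this small example the minimal average throughput is 1/2 (one in per two time steps), so the average performance of the bad cycle, its inverse, is 2.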

Although the above method is bounded by a complexity of $O(N^3)$, the construction of the
shortest-paths matrix $d$ always has a cost of $\Theta(N^3)$. In a common scenario where the behaviour of $P$
can be very complex, the computation of the matrix could be expensive, as reported in Table
[2]. To get around the problem, we have developed an improved algorithm. Starting from $G$ and
$G_0$ as defined above, we reverse the edges of $G_0$ to obtain the graph $G_0^{-1}$. Since we are only
interested in paths leading to a time step, for each full time step $(v, v')$ of $G$, we apply Dijkstra's

4 Pipe and Buff are two different implementations of the same buffer discussed in the next section. We have left
out Fifo since its representation is too small for a sensible comparison.

Cells     nodes/edges of G    previous algorithm    new algorithm    nodes/edges of G'    Gain
Pipe 5    114/292                            546               62    114/3648             88.6 %
Buff 5    96/216                             469               62    96/4608              86.7 %
Pipe 6    272/759                           4279              266    272/17408            93.7 %
Buff 6    160/368                           1422              172    160/12800            87.9 %
Pipe 7    648/1958                         15000             1438    648/82944            90.4 %
Buff 7    240/560                           7485              437    240/28800            94.1 %
Pipe 8    1544/5034                            -             6672    1544/395264          100 %
Buff 8    336/792                          12454              734    336/56448            94.1 %
Pipe 9    3680/12648                           -            56000    3680/1884160         100 %
Buff 9    448/1064                         45031             1766    448/100352           96.0 %

Table 2: Construction time of G' (expressed in ms)



algorithm to $G_0^{-1}$ with root node $v$ and weight 1 for in-transitions and 0 otherwise, as above. Finally,
for each node $u$ such that there exists a path from $v$ to $u$ in $G_0^{-1}$, we add an edge $(u, v')$ in $G'$ where the
cost is the length of a shortest path from $v$ to $u$.
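
The Dijkstra-based construction can be sketched as follows, again as an illustrative Python version rather than FASE's Java code. The edge encoding `(source, label, target)` with the label `"time"` for full time steps is an assumption, and caching the result per source node $v$ is a small addition of this sketch for the case where several time steps leave the same node.

```python
import heapq
from collections import defaultdict
from itertools import count
from math import inf


def shortest_from(reverse_adj, root):
    """Dijkstra on the reversed G_0; returns {node: distance from root}."""
    dist = {root: 0}
    tie = count()                           # tie-breaker: nodes are never compared
    heap = [(0, next(tie), root)]
    while heap:
        d, _, node = heapq.heappop(heap)
        if d > dist[node]:
            continue                        # stale queue entry
        for prev, weight in reverse_adj.get(node, ()):
            nd = d + weight
            if nd < dist.get(prev, inf):
                dist[prev] = nd
                heapq.heappush(heap, (nd, next(tie), prev))
    return dist


def build_g_prime_dijkstra(edges):
    """Build G' as {(u, v2): cost}, one Dijkstra run per time-step source."""
    reverse_adj = defaultdict(list)         # reversed G_0: target -> [(source, w)]
    time_steps = []
    for u, label, v in edges:
        if label == "time":
            time_steps.append((u, v))
        else:
            reverse_adj[v].append((u, 1 if label == "in" else 0))
    g_prime, cache = {}, {}
    for v, v2 in time_steps:
        if v not in cache:                  # reuse the run for repeated sources
            cache[v] = shortest_from(reverse_adj, v)
        for u, cost in cache[v].items():
            g_prime[(u, v2)] = min(g_prime.get((u, v2), inf), cost)
    return g_prime
```

Only distances to nodes that are the source of some time step are ever computed, which is where the saving over the all-pairs matrix comes from.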

With this approach, we calculate only those (shortest) paths that lead to time steps, i.e.
only those paths that correspond to edges in $G'$. On the contrary, in the original algorithm,
(shortest) paths between all pairs of nodes are computed. Since Dijkstra's algorithm runs in
time $O(E + N \log N)$ [1] (with $N$ and $E$ the number of nodes and edges respectively), constructing
$G'$ takes $O(N(E + N \log N))$, but at least the first factor $N$ will be considerably smaller in practice.
Table [2] shows the improvements obtained when considering large buffer implementations.

4 Evaluating the performance of three bounded buffer implementations

In this section, we evaluate the worst-case efficiency of three implementations of a bounded
buffer (of capacity $N + 2$, where $N > 1$ is a fixed natural number) with FASE. These
implementations have already been considered in [3], where their efficiency has been compared via the
faster-than preorder relation $\sqsupseteq$ defined in [6]. In particular, we want to investigate whether the results
stated in [3] still hold in our quantitative setting with the restricted class of users. The three
implementations are Fifo (a bounded-length first-in-first-out queue), Pipe (a sequence of
one-place buffers connected end to end) and Buff (an array used in a circular fashion). Unlike [3], we
abstract away from the actual values stored in the buffers and assume that the latter perform,
as visible actions, either an in-action (meaning that a value is saved in a free cell of the buffer)
or an out-action (meaning that the buffer gives back a value to the external environment). This
choice does not influence performance, as already discussed in [4], since the operations are
data-independent, and it allows us to reduce considerably the number of states considered when
calculating the response performance.

The first buffer implementation Fifo shown in Fig. [3] directly implements a first-in-first-out 
queue of capacity N+2. It has no overhead in terms of internal actions and it is purely sequential. 
In the examples, we use names and defining equations (using =) to describe recursive behaviour. 

Figure 3: The Software Architecture for Fifo (SOURCE, in, out, DEST)



Definition 4.1 (The buffer Fifo) We define Fifo = Fifo(0) where, for each $i = 0, \ldots, N + 2$,
Fifo(i) is defined as follows:

1. Fifo(0) = in.Fifo(1)

2. for $0 < i < N + 2$, Fifo(i) = in.Fifo(i + 1) + out.Fifo(i - 1)

3. Fifo(N + 2) = out.Fifo(N + 1)

Proposition 4.2 The asymptotic performance of Fifo is 2 (i.e. $rp_{Fifo}(n) = 2n + O(1)$).
Moreover, for any $N > 1$, $rp_{Fifo}(n) = 2n$.

Proof: We have used FASE in order to automatically prove that Fifo does not have
catastrophic cycles and to calculate its asymptotic performance. As for its response
performance, we can easily see that Fifo may need a time step for any input or output. E.g.
$(\mathbb{A}\, in\, \mathbb{A}\, out)^{n-1}\, \mathbb{A}\, in\, \mathbb{A}$ is an n-critical path with a maximum number of time steps, namely
$2(n - 1) + 2$. We can conclude that $rp_{Fifo}(n) = 2n$. □

A buffer can also be implemented as a concatenation of $N + 2$ cells as shown in Fig. [4], where
a cell is an input/output device that contains at most one value. In such a case, the cells have
to be connected end-to-end, so that the output of each cell becomes the input of the next one.

Definition 4.3 (The buffer Pipe) We define an empty cell as the process $C = in.C'$ where
$C' = out.C$. Let $i = 0, \ldots, N + 1$; the $i$-th cell of Pipe is defined by $C_i = C[\Phi_i]$ where the
relabelling function $\Phi_i$ is defined as follows:

$\Phi_i(\alpha) = \begin{cases} s_i & \text{if } \alpha = in \text{ and } 0 \le i \le N \\ s_{i-1} & \text{if } \alpha = out \text{ and } 1 \le i \le N + 1 \\ \alpha & \text{otherwise} \end{cases}$

Here each action $s_j$ passes the value from the $(j + 1)$-th to the $j$-th cell. We force synchronisation
between two consecutive cells by properly relabelling the in- and out-actions of single cells. Let $A =
\{s_0, s_1, \ldots, s_{N+1}\}$. We define $Pipe = (C_0 \|_{\{s_0\}} C_1 \|_{\{s_1\}} \ldots \|_{\{s_N\}} C_{N+1})/A$ where, for any given
$P \in \mathbb{P}$, the process $P/A$ is the same as $P[\Phi_A]$ where $\Phi_A(\alpha) = \tau$ if $\alpha \in A$ and $\Phi_A(\alpha) = \alpha$ if
$\alpha \notin A$.

Besides input and output of values, Pipe performs a number of activities in order to manage
the queue of cells, i.e. to move values from a cell to the next one. These actions are
synchronisations between consecutive cells on actions $s_i$ and have been made internal. Moreover, note
that Pipe receives input values in cell $N + 1$ (the only in-action not renamed by the functions $\Phi_i$ is
the one performed by this cell) and delivers output values at cell 0.

Figure 4: The Software Architecture for Pipe



Proposition 4.4 The asymptotic performance of Pipe is 2. Moreover, for any $N > 1$, we have
that $rp_{Pipe}(n) = 2n + (N + 1)$.

Sketch of the proof: Again, we have used FASE to prove that rRTS(Pipe) does not contain
catastrophic cycles and to evaluate Pipe's asymptotic performance. We only indicate why
$rp_{Pipe}(n) = 2n + (N + 1)$. The first value is moved to cell $N + 1$ after one time step; with
every further time step, it is moved to the next cell; so it arrives in cell 0 after $N + 2$ time steps
and is delivered with the next one. After the second time step, cell $N + 1$ becomes empty, so the
second value is put into cell $N + 1$ after three time steps and then moves along the pipe with
the same speed as the first one. Thus, the next value is always delivered after two more time
steps; see [5] for the formal treatment of a more general case. □

In Fig. [5] it has been assumed that $N$ cells are not connected end-to-end but are used as a
storage. These cells interact with a centralised buffer controller which can store two more values
and uses the cells in the storage as a circular queue (ordered as $0 < 1 < \ldots < N - 1$). In this
case, it is the buffer controller that interacts with the external environment. In more detail, the
buffer controller accepts a value from the external environment and then writes it in the first
empty cell. It also reads the oldest undelivered value from the array and outputs it whenever
possible. In the following we write $a \oplus b$ to denote $(a + b) \bmod N$.

Definition 4.5 (The buffer Buff) Let $i = 0, \ldots, N - 1$. The $i$-th cell of the storage is described
by the process $B_i = C[\Phi_i]$ where the relabelling functions $\Phi_i$ are defined by

$\Phi_i(\alpha) = \begin{cases} \omega_i & \text{if } \alpha = in \\ \rho_i & \text{if } \alpha = out \\ \alpha & \text{otherwise} \end{cases}$

Here we use the action $\omega_i$ ($\rho_i$) to denote the writing of a value into the storage (the reading
of a value from the storage, respectively). Let $B = \{\omega_i, \rho_i \mid i = 0, \ldots, N\}$ be the set of all these
actions and $Mem = (B_0 \|_\emptyset \ldots \|_\emptyset B_{N-1})$.

The state of the buffer controller, $BC(x, y, i, m)$, is determined by four arguments: $x, y \in
V = \{\bot, \Box\}$ are used to represent the absence or presence of an input value (output value resp.)
(see below) stored in BC, $i$ is the index of the cell that contains the oldest undelivered value and
$m$ is the number of values currently stored in Mem. If $x = \bot$ the buffer controller can accept a
new value, otherwise (i.e. if $x = \Box$) it has to wait until the last accepted value is actually stored
in Mem. Analogously, if $y = \Box$ then the buffer controller is ready to produce an output and if
$y = \bot$ no value is available for immediate output. Let $x, y \in V$, $0 \le i \le N - 1$ and $0 \le m \le N$.
We define:

1. $BC(\bot, \bot, i, 0) = in.BC(\Box, \bot, i, 0)$;

2. $m > 0$ implies $BC(\bot, \bot, i, m) = in.BC(\Box, \bot, i, m) + \rho_i.BC(\bot, \Box, i \oplus 1, m - 1)$;

3. $BC(\Box, \bot, i, 0) = \omega_i.BC(\bot, \bot, i, 1)$;

4. $0 < m < N$ implies $BC(\Box, \bot, i, m) = \omega_{i \oplus m}.BC(\bot, \bot, i, m + 1) + \rho_i.BC(\Box, \Box, i \oplus 1, m - 1)$;

5. $BC(\Box, \bot, i, N) = \rho_i.BC(\Box, \Box, i \oplus 1, N - 1)$;

6. $BC(\bot, \Box, i, m) = in.BC(\Box, \Box, i, m) + out.BC(\bot, \bot, i, m)$;

7. $m < N$ implies $BC(\Box, \Box, i, m) = \omega_{i \oplus m}.BC(\bot, \Box, i, m + 1) + out.BC(\Box, \bot, i, m)$;

8. $BC(\Box, \Box, i, N) = out.BC(\Box, \bot, i, N)$.

We finally define $\mathit{Buff} = (\mathit{Mem} \parallel_B BC(\bot, \bot, 0, 0)) / B$. Notice that in this way all the actions
we use to read and write values in Mem are made internal.
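
The eight defining equations of the buffer controller can be read as a small transition function. The following Python sketch is only an illustration and is not part of FASE: the names `bc_steps`, `reachable`, `EMPTY` and `FULL` are hypothetical, the actions $\omega_j$ and $\rho_j$ are written `w<j>` and `r<j>`, and the cells of Mem are not modelled (we assume the counter $m$ alone decides which reads and writes are possible).

```python
from collections import deque

EMPTY, FULL = "bot", "box"  # hypothetical encodings of the two values in V

def bc_steps(state, N):
    """Return the (action, next_state) pairs of BC(x, y, i, m) given by rules 1-8."""
    x, y, i, m = state
    steps = []
    if x == EMPTY:
        # rules 1, 2, 6: the input part is free, so a new value can be accepted
        steps.append(("in", (FULL, y, i, m)))
    elif m < N:
        # rules 3, 4, 7: write the pending input into cell (i + m) mod N
        steps.append((f"w{(i + m) % N}", (EMPTY, y, i, m + 1)))
    if y == FULL:
        # rules 6, 7, 8: deliver the value held in the output part
        steps.append(("out", (x, EMPTY, i, m)))
    elif m > 0:
        # rules 2, 4, 5: read the oldest value from cell i
        steps.append((f"r{i}", (x, FULL, (i + 1) % N, m - 1)))
    return steps

def reachable(N):
    """Breadth-first exploration of the controller states from BC(bot, bot, 0, 0)."""
    start = (EMPTY, EMPTY, 0, 0)
    seen, todo = {start}, deque([start])
    while todo:
        state = todo.popleft()
        for _, nxt in bc_steps(state, N):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen
```

For instance, `bc_steps(("box", "bot", 1, 2), 3)` yields a write to cell 0 and a read from cell 1, as prescribed by rule 4 with $i = 1$ and $m = 2$.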

Proposition 4.6 For any $N > 1$, we have $rp_{\mathit{Buff}}(n) = 4n$.

Proof: Also in this case we have used FASE to prove that rRTS(Buff) does not have catastrophic 
cycles and to evaluate its asymptotic performance. Concerning its response performance, con- 
sider first the case of one value: after a time step, the value is taken into the input part of 
BC; after another time step, it is moved into Mem; after the third time step it is moved into 
the output part of BC; after the fourth time step, it is delivered. For several values, these 
sequences can be interleaved to some degree; but since BC takes part in each action, all these 
actions are performed sequentially, and always after a time step in the worst case. E.g. for 
$n = kN + m$ for some $k > 1$ and $m < N$, first we fill up and clear the buffer with the sequence
$((A\,\mathit{in}\,A\,\tau)^N (A\,\tau\,A\,\mathit{out})^N)^k$, fill it up again with a sequence $(A\,\mathit{in}\,A\,\tau)^m$ and finally empty it with
the sequence $(A\,\tau\,A\,\mathit{out})^{m-1} A\,\tau\,A$. All paths of that form (up to permutations) are $n$-critical
paths with the maximum number of time steps, that is $4Nk + 2m + 2(m-1) + 2 = 4n$. □
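
The count of time steps in the proof above can be checked mechanically. The sketch below is only an illustration under stated assumptions: it builds the path of the proof as a list of symbols, with `"A"` standing for a time step and `"tau"` for an internal read or write, and it assumes $1 \le m < N$ so that the final segment is well defined. The function names `worst_case_path` and `time_steps` are hypothetical.

```python
def worst_case_path(N, k, m):
    """Build the path used in the proof of Proposition 4.6 for n = k*N + m values."""
    assert k >= 1 and 1 <= m < N, "assumption of this sketch: k >= 1 and 1 <= m < N"
    fill = ["A", "in", "A", "tau"]     # accept a value, then move it into Mem
    drain = ["A", "tau", "A", "out"]   # fetch a value from Mem, then deliver it
    path = (fill * N + drain * N) * k  # k rounds of filling and clearing the buffer
    path += fill * m                   # fill the buffer again with m values
    path += drain * (m - 1)            # deliver all but the last value
    path += ["A", "tau", "A"]          # last value: stop just before its delivery
    return path

def time_steps(path):
    """Number of time steps (symbol "A") along the path."""
    return path.count("A")
```

With $N = 3$, $k = 2$ and $m = 1$ we have $n = 7$, and `time_steps(worst_case_path(3, 2, 1))` gives $4 \cdot 3 \cdot 2 + 2 + 0 + 2 = 28 = 4n$.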

Now we can state the main result of this paper. This follows as a straightforward consequence
of Propositions 4.2, 4.4 and 4.6.

Corollary 4.7 For any $N > 1$, Fifo is more efficient than both Pipe and Buff (w.r.t. the
quantitative point of view). Moreover, Buff is more efficient than Pipe iff $n < \lfloor \frac{N+1}{2} \rfloor$.
5 Concluding remarks



The results obtained with our tool are quite different from those presented in [3] where the 
same buffer implementations have been compared using the efficiency preorder defined in [6].
In [3] it is stated that Fifo and Pipe are unrelated according to the worst-case efficiency preorder
(unrelated means that neither process is more efficient than the other). Similarly, Buff and
Pipe are unrelated. The authors provide good reasons for these
results and also prove that Fifo is more efficient than Buff but not vice versa. 

As already stated in the introduction, the efficiency preorder is based on arbitrary test envi- 
ronments, whereas we have only used restricted environments adequate for quantitative reasoning 
in this paper. To explain the results of [3], we consider the refusal trace $v = \mathit{in}\; A\; \emptyset\; \mathit{out}\; \{\mathit{in}\} \in
RT(\mathrm{Fifo}) \setminus RT(\mathrm{Pipe})$, which can be understood as a witness of slow behaviour of Fifo, justifying
Fifo $\not\sqsupseteq$ Pipe. This trace tells us that Fifo can perform two time steps after an in provided the
environment does not offer a communication after the first one (Fifo itself would neither block
in nor out); then it can deliver the value and can now delay in (as after any visible action). Now
we show that none of our users can be such a suitable context, i.e. that Fifo cannot participate
in such a discrete trace $v$ when running in parallel with a user $U_n$; hence, $v$ is not relevant for
$rp_{\mathrm{Fifo}}$.

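$$\mathrm{Fifo} \parallel U_n \rightarrow \mathrm{Fifo}(1) \parallel (U_{n-1} \parallel_{\{\omega\}} \mathit{out}.\omega) \rightarrow_r P' = \mathrm{Fifo}(1) \parallel (U_{n-1} \parallel_{\{\omega\}} \mathit{out}.\omega)$$

Here, $\mathrm{Fifo}(1) = (\mathit{in}.\mathrm{Fifo}(2) + \mathit{out}.\mathrm{Fifo}(0))$ can perform $\rightarrow_r$ to itself; but by the refusal semantics
we could have $P' \rightarrow_r$ only if $(U_{n-1} \parallel_{\{\omega\}} \mathit{out}.\omega)$ is able to refuse both in and out. And this is clearly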
not the case. We are currently working on this qualitative/quantitative issue by defining a slight
variation of the faster-than preorder given in [6] that relates processes w.r.t. the restricted class
of tests U as in [1], but is still characterised by some variant of refusal trace inclusion.

Our aim is to tune FASE to allow the analysis of larger systems, where the performance module
needs more attention since it implements the theories introduced above. A first important result
we have already obtained is an improved detection of catastrophic cycles; ensuring their
absence is the basis for any further performance analysis. A second result regards the calculation
of the bad cycle, especially when we consider complex processes. However, the graph G' used
in Karp's algorithm could be very large, and we will investigate ways to minimise it. We are
also working on a good strategy to determine the response performance of P for a given n.
Different approaches are under investigation but they still need to be validated. Currently, FASE
executes an exhaustive search on rRTS(P) that looks for the n-critical path whose duration is
maximal; clearly, as n increases this approach soon becomes intractable, especially for complex
processes. Even though it is a rough solution, it has at least helped to validate the results on response
performance presented in the above propositions.
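
To make the cost of the cycle computation concrete, the following Python sketch shows the textbook maximum-cycle-mean variant of Karp's algorithm on an explicit edge list. It is not the FASE implementation: how the graph G' and its edge weights are derived from rRTS(P) is not described in this section, so the function name `max_cycle_mean`, the integer edge weights and the node numbering are assumptions made for illustration only. The table `D` has $(n+1) \cdot n$ entries for a graph with $n$ nodes, which is why a large G' is a concern.

```python
from fractions import Fraction
import math

def max_cycle_mean(num_nodes, edges):
    """Maximum mean weight of a cycle, or None if the graph is acyclic.

    edges is a list of (u, v, w) with nodes 0..num_nodes-1 and integer weights w.
    """
    n = num_nodes
    neg_inf = -math.inf
    # D[k][v]: maximum weight of a walk with exactly k edges that ends in v.
    # Starting from 0 everywhere acts like a virtual source connected to every node.
    D = [[neg_inf] * n for _ in range(n + 1)]
    D[0] = [0] * n
    for k in range(1, n + 1):
        for u, v, w in edges:
            if D[k - 1][u] != neg_inf:
                D[k][v] = max(D[k][v], D[k - 1][u] + w)
    best = None
    for v in range(n):
        if D[n][v] == neg_inf:
            continue  # no walk with n edges ends in v
        # Karp's formula, max-mean version: maximise over v the minimum over k
        worst = min(Fraction(D[n][v] - D[k][v], n - k)
                    for k in range(n) if D[k][v] != neg_inf)
        best = worst if best is None else max(best, worst)
    return best
```

As a usage example, for the edges `[(0, 1, 1), (1, 0, 0), (1, 2, 1), (2, 1, 1)]` on three nodes, the cycle through nodes 0 and 1 has mean 1/2 and the cycle through nodes 1 and 2 has mean 1, so the function returns 1.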

Anyhow, FASE represents a good first step towards the creation of an integrated framework 
for the analysis of concurrent systems modelled through PAFAS. The improvements introduced 
with FASE and the possibility to derive the complete set of behavioural traces of the modelled 
system allowed us to study and validate many results, such as the ones stated in this paper, that 
would have been harder to calculate without an automated tool like FASE. 